Boosting XML Filtering with a Scalable FPGA-based Architecture
The growing amount of XML encoded data exchanged over the Internet increases
the importance of XML based publish-subscribe (pub-sub) and content based
routing systems. The input in such systems typically consists of a stream of
XML documents and a set of user subscriptions expressed as XML queries. The
pub-sub system then filters the published documents and passes them to the
subscribers. Pub-sub systems are characterized by very high input rates,
so processing time is critical. In this paper we propose a "pure
hardware" based solution, which utilizes XPath query blocks on FPGA to solve
the filtering problem. By utilizing the high throughput that an FPGA provides
for parallel processing, our approach achieves drastically better throughput
than the existing software or mixed (hardware/software) architectures. The
XPath queries (subscriptions) are translated to regular expressions which are
then mapped to FPGA devices. By introducing stacks within the FPGA we are able
to express and process a wide range of path queries very efficiently, in a
scalable environment. Moreover, performing the parsing and the filter
processing on the same FPGA chip eliminates expensive
communication costs (that a multi-core system would incur), thus enabling very
fast and efficient pipelining. Our experimental evaluation reveals more than
one order of magnitude improvement compared to traditional pub/sub systems.
Comment: CIDR 200
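
To make the stack-based path matching described in the abstract above more concrete, here is a minimal software sketch in Python. It is not the paper's FPGA design: the function names (`parse_path`, `xml_events`, `path_matches`) and the restriction to simple paths built from `/`, `//`, and `*` are assumptions made for illustration. It keeps a stack of "steps matched so far" sets, pushing on every opening tag and popping on every closing tag.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List, Set, Tuple

Step = Tuple[str, str]  # (axis, tag); axis is "/" (child) or "//" (descendant)


def parse_path(query: str) -> List[Step]:
    """Split a simple path such as '/a//b/c' into (axis, tag) steps."""
    if not query.startswith("/"):
        raise ValueError("this sketch only handles paths that start with '/'")
    steps: List[Step] = []
    i = 0
    while i < len(query):
        axis = "//" if query.startswith("//", i) else "/"
        i += len(axis)
        j = query.find("/", i)
        j = len(query) if j == -1 else j
        steps.append((axis, query[i:j]))
        i = j
    return steps


def xml_events(doc: str) -> Iterator[Tuple[str, str]]:
    """Turn an XML string into a stream of ('open' | 'close', tag) events."""
    source = io.BytesIO(doc.encode("utf-8"))
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        yield ("open" if kind == "start" else "close", elem.tag)


def path_matches(query: str, events: Iterable[Tuple[str, str]]) -> bool:
    """Return True as soon as some element in the stream satisfies the path."""
    steps = parse_path(query)
    # Each stack entry holds the numbers of steps already matched on the
    # path from the root down to the currently open element.
    stack: List[Set[int]] = [{0}]
    for kind, tag in events:
        if kind == "open":
            active: Set[int] = set()
            for matched in stack[-1]:
                axis, wanted = steps[matched]
                if wanted == tag or wanted == "*":
                    active.add(matched + 1)   # this element satisfies the step
                if axis == "//":
                    active.add(matched)       # a deeper descendant may still match
            if len(steps) in active:
                return True
            stack.append(active)
        else:
            stack.pop()
    return False
```

For example, `path_matches("/a//b/c", xml_events("<a><x><b><c/></b></x></a>"))` walks the document once and keeps only one small set per open element, which is the property that makes this style of matching suitable for streaming.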
Designing Access Methods for Bitemporal Databases
By supporting the valid and transaction time dimensions, bitemporal
databases represent reality more accurately than conventional
databases. In this paper we examine the issues involved in designing
efficient access methods for bitemporal databases and propose the
partial-persistence and the double-tree methodologies. The partial-
persistence methodology reduces bitemporal queries to partial
persistence problems for which an efficient access method is then
designed. The double-tree methodology "sees" each bitemporal data
object as consisting of two intervals (a valid-time and a transaction-
time interval), and divides objects into two categories according to
whether the right endpoint of the transaction time interval is already
known. A common characteristic of both methodologies is that they
take into account the properties of each time dimension. Their
performance is compared with a straightforward approach that
"sees" the intervals associated with a bitemporal object as
composing one rectangle which is stored in a single
multidimensional access method. Given that some limited additional
space is available, our experimental results show that the partial-
persistence methodology provides the best overall performance,
especially for transaction timeslice queries. For those applications
that require ready, off-the-shelf, access methods the double-tree
methodology is a good alternative.
(Also cross-referenced as UMIACS-TR-97-24)
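
The double-tree idea from the abstract above can be illustrated with a small Python sketch. This is an assumption-laden toy, not the paper's access method: plain lists stand in for the two index structures, and the names `BitemporalRecord`, `DoubleStore`, `insert`, `logically_delete`, and `query` are hypothetical. It shows only the split of objects by whether the right endpoint of the transaction-time interval is already known, and a query over both time dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BitemporalRecord:
    key: str
    valid_from: int                # valid-time interval [valid_from, valid_to)
    valid_to: int
    tx_from: int                   # transaction-time interval [tx_from, tx_to)
    tx_to: Optional[int] = None    # None: right endpoint not yet known


class DoubleStore:
    """Two containers, one per category of transaction-time interval."""

    def __init__(self) -> None:
        self.open_tx: List[BitemporalRecord] = []    # tx_to still unknown
        self.closed_tx: List[BitemporalRecord] = []  # tx_to known

    def insert(self, key: str, valid_from: int, valid_to: int,
               now: int) -> BitemporalRecord:
        rec = BitemporalRecord(key, valid_from, valid_to, tx_from=now)
        self.open_tx.append(rec)
        return rec

    def logically_delete(self, rec: BitemporalRecord, now: int) -> None:
        # Nothing is physically removed: the record moves to the other
        # container once its transaction-time end becomes known.
        self.open_tx.remove(rec)
        rec.tx_to = now
        self.closed_tx.append(rec)

    def query(self, valid_at: int, tx_at: int) -> List[BitemporalRecord]:
        """Records valid at `valid_at` according to the database as of `tx_at`."""
        hits = [r for r in self.open_tx
                if r.tx_from <= tx_at
                and r.valid_from <= valid_at < r.valid_to]
        hits += [r for r in self.closed_tx
                 if r.tx_from <= tx_at < r.tx_to
                 and r.valid_from <= valid_at < r.valid_to]
        return hits
```

In a real system each container would be backed by a suitable index rather than scanned linearly; the sketch is meant only to show why the two categories can be handled separately.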
Querying Spatio-temporal Patterns in Mobile Phone-Call Databases
Call Detail Record (CDR) databases contain millions of records with information about cell phone calls, including the position of the user when the call was made or received. This huge amount of spatiotemporal data opens the door to the study of human trajectories on a large scale without the bias that other sources (like GPS or WLAN networks) introduce in the population studied. It also provides a platform for a wide variety of studies, ranging from the spread of diseases to the planning of public transport. Nevertheless, previous work on spatiotemporal queries does not provide a framework flexible enough to express the complexity of human trajectories. In this paper we present the Spatiotemporal Pattern System (STPS) to query spatiotemporal patterns in very large CDR databases. STPS defines a regular-expression query language that is intuitive and that allows any combination of spatial and temporal predicates with constraints, including the use of variables. The design of the language took into consideration the layout of the areas covered by the cellular towers, as well as “areas” that label places of interest (e.g., neighborhoods, parks, etc.) and topological operators. STPS includes an underlying indexing structure and algorithms for query processing using different evaluation strategies. A full implementation of STPS is currently running with real, very large CDR databases at Telefónica Research Labs. An extensive performance evaluation of STPS shows that it can efficiently find complex mobility patterns in large CDR databases.
AsterixDB: A Scalable, Open Source BDMS
AsterixDB is a new, full-function BDMS (Big Data Management System) with a
feature set that distinguishes it from other platforms in today's open source
Big Data ecosystem. Its features make it well-suited to applications like web
data warehousing, social data storage and analysis, and other use cases related
to Big Data. AsterixDB has a flexible NoSQL style data model; a query language
that supports a wide range of queries; a scalable runtime; partitioned,
LSM-based data storage and indexing (including B+-tree, R-tree, and text
indexes); support for external as well as natively stored data; a rich set of
built-in types; support for fuzzy, spatial, and temporal types and queries; a
built-in notion of data feeds for ingestion of data; and transaction support
akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open
source release. This paper is the first complete description of the resulting
open source AsterixDB system. Covered herein are the system's data model, its
query language, and its software architecture. Also included are a summary of
the current status of the project and a first glimpse into how AsterixDB
performs when compared to alternative technologies, including a parallel
relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data
analytics platform, for things that both technologies can do. Also included is
a brief description of some initial trials that the system has undergone and
the lessons learned (and plans laid) based on those early "customer"
engagements.
High performance FPGA and GPU complex pattern matching over spatio-temporal streams
The wide and increasing availability of collected data in the form of trajectories has led to research advances in behavioral aspects of the monitored subjects (e.g., wild animals, people, and vehicles). Using trajectory data harvested by devices such as GPS, RFID, and mobile devices, complex pattern queries can be posed to select trajectories based on specific events of interest. In this paper, we present a study on FPGA- and GPU-based architectures processing complex patterns on streams of spatio-temporal data. Complex patterns are described as regular expressions over a spatial alphabet that can be implicitly or explicitly anchored to the time domain. More importantly, variables can be used to substantially enhance the flexibility and expressive power of pattern queries. Here we explore the challenges in handling several constructs of the assumed pattern query language, with a study on the trade-offs between expressiveness, scalability, and matching accuracy. We show an extensive performance evaluation where FPGA and GPU setups outperform the current state-of-the-art (single-threaded) CPU-based approaches, by over three orders of magnitude for FPGAs (for expressive queries) and by up to two orders of magnitude for certain datasets on GPUs (with a slowdown in some cases). Unlike software-based approaches, the performance of the proposed FPGA and GPU solutions is only minimally affected by increased pattern complexity.
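
As a rough illustration of "regular expressions over a spatial alphabet" with variables, the following Python sketch encodes a trajectory as a string of region symbols and matches it with the standard `re` module. Everything here is an assumption for illustration: the rectangular `REGIONS`, the symbol `_` for uncovered points, the function names, and the use of named-group backreferences as a stand-in for pattern variables. It is a single-threaded software toy, not the FPGA or GPU design evaluated in the paper, and it ignores the time dimension.

```python
import re
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Hypothetical spatial alphabet: one symbol per rectangular region.
REGIONS: Dict[str, Region] = {
    "A": (0, 0, 10, 10),
    "B": (10, 0, 20, 10),
    "C": (0, 10, 10, 20),
}


def symbol_for(x: float, y: float) -> str:
    """Return the symbol of the region containing (x, y), or '_' if none."""
    for symbol, (xmin, ymin, xmax, ymax) in REGIONS.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return symbol
    return "_"


def encode(trajectory: List[Point]) -> str:
    """Map a sequence of positions to a string over the spatial alphabet."""
    return "".join(symbol_for(x, y) for x, y in trajectory)


def trajectory_matches(pattern: str, trajectory: List[Point]) -> bool:
    """True if the whole encoded trajectory matches the pattern."""
    return re.fullmatch(pattern, encode(trajectory)) is not None


# Fixed pattern: start in A, move to B, end in A.
ROUND_TRIP_FROM_A = r"A+B+.*A+"
# Variable pattern: start in some region x, pass through B, end back in x.
ROUND_TRIP_VIA_B = r"(?P<x>[A-C]).*B.*(?P=x)"
```

With these definitions, `trajectory_matches(ROUND_TRIP_VIA_B, points)` accepts any trajectory that starts and ends in the same region and visits B in between, without naming that region in advance.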